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We investigate the extent to which the commonly used standard pairwise contact po- 
tential can be used to identify the native fold of a protein. Ideally one would hope that a 
universal energy function exists, for which the native folds of all proteins are the respective 
ground states. Here we pose a much more restricted question: is it possible to find a set 
of contact parameters for which the energy of the native contact map of a single protein 
(crambin) is lower than that of all possible physically realizable decoy maps. We seek such 
a set of parameters by perceptron learning, a procedure which is guaranteed to find such a 
set if it exists. We found that it is extremely hard (and most probably, impossible) to fine 
tune contact parameters that will assign all alternative conformations higher energy than 
that of the native map. This finding clearly indicates that it is impossible to derive a general 
pairwise contact potential that can be used to fold any given protein. Inclusion of additional 
energy terms, such as hydrophobic (solvation), hydrogen bond or multi-body interactions 
may help to attain foldability within specific structural families. 



Keywords: 
ceptron. 



protein folding; protein folding potential; contact map; neural networks; per- 



I. INTRODUCTION 

Nearly all the important biochemical tasks of or- 
ganisms such as catalytic activity, molecular recog- 
nition and transmission of signals are performed by 
proteins jjj . The biological function of these macro- 
molecules is determined by the specific shapes into 
which they fold under physiological conditions. The 
blueprint for the protein's conformation is its chem- 
ical composition, e.g. amino acid sequence. The 
central problem of protein folding |l[] is to pre- 
dict proteins' native structures from their amino 
acid sequences; solution of this problem will have 
a formidable impact on molecular biophysics and 
drug design. At present, genome projects have made 
available the sequences of hundreds of thousands 
of proteins Q. The full potential of this achieve- 
ment will be realized only when we are able to rou- 
tinely translate the knowledge of a sequence of a 
protein into the prediction of its shape and function. 
Moreover, since by using powerful recombinant DNA 
techniques || we can now create proteins with any 
pre-designed amino acids sequence, it will be possi- 
ble to create synthetic proteins with entirely novel 
functions. 

A conceptually straightforward attempt to solve 
the problem is to construct, for any given molecule, 
an energy function using the inter-atomic potentials 
and look for energy minima. Alternatively, one can 
use molecular dynamics, e.g. work at an energy cor- 



responding to kT and integrate Newton's equations. 
Such a direct attack on the problem lies beyond the 
possibilities of existing computers, partly because of 
the large number of atoms that comprise a single 
protein and partly because the exact potential is not 
known (we are looking for a classical effective inter- 
action between ions and atoms; furthermore, folding 
takes place in the presence of water and the wa- 
ter molecules must be "integrated out"). This state 
of affairs points to a need for approximate, coarse 
grained or reduced representations of protein struc- 
ture and derivation of corresponding energy func- 
tions. 

A minimalistic representation of a protein's struc- 
ture is given by its contact map @-||]. The contact 
map of a protein with TV residues is a TV x N matrix 
S, whose elements are defined as 



Sij — 



if residues i and j are in contact 
otherwise 



(1) 



One can define contact between two residues in dif- 
ferent ways. In this work, we will consider two amino 
acids in contact when their two C a atoms are closer 
than 8.5 A m. Given all the inter-residue contacts 
or even a subset of them, it is possible to reconstruct 
quite well a protein's structure, by means of cither 
distance geometry Molecular Dynamics Jl(| or 
Monte Carlo [§. 
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In contrast to Cartesian coordinates, the map rep- 
resentation of protein structure is independent of the 
coordinate frame. This property made contact maps 
attractive for protein structure comparisons and for 
searching a limited database for similar structures 
A more challenging possibility was proposed 
recently 0: to use the contact map representation 
for folding, e.g. to search the space of contact maps 
for the map that corresponds to the native fold. The 
central premise of this program is that the contact 
map representation has an important computational 
advantage; changing a few contacts in a map induces 
rather significant large-scale coherent moves of the 
corresponding polypeptide chain |ll|] . The proposed 
program faces, however, three considerable difficul- 
ties: 

1. One needs an efficient procedure to execute 
these non-local moves 

2. There must be a way to test that the resulting 
maps correspond to physically realizable con- 
formations and 

3. One should construct a reliable (free) energy 
function, defined in contact map space, such 
that low-energy maps can be used to identify 
the native one. 

We made considerable progress jll] on the first of 
these problems and have found an efficient method 
to solve the second [||. In this study we present 
some new questions that are relevant to the third 
issue, of identifying a reliable energy function. We 
also introduce a suitable methodology to address the 
questions raised. 

The most commonly used energy function for 
threading sequence a into a fold whose contact map 
is S has the form 



E(sl, S,w) = S t jw(a i ,a J ) 



(2) 



The 210 parameters w(ai,aj) represent the energy 
gained by bringing amino acids <Zj and aj in contact. 

Of the two main methods that have been used 
in the past to derive contact energy parameters, 
knowledge-based techniques were the first to be pro- 
posed. These methods rely on the quasi-chemical ap- 
proximation [|l2] |l6| to derive contact energies from a 
statistical analysis of known protein structures. Al- 
though suitable for more limited purposes, such as 
fold recognition fl7|| or threading fl2|| , energy param- 
eters obtained this way have failed, so far, to produce 
acceptable maps by energy minimization (sometimes 
referred to as ab initio folding). 



More recently, much attention has been devoted 
to a new class of potentials, derived by optimization 
|§^3[. For example Mirny and Shakhnovich Q 
determine the contact energy parameters by mini- 
mizing simultaneously the Z-score of the native maps 
of a large set of proteins. Hao and Scheraga [^(|, us- 
ing a much more detailed representation, tried to 
find energy parameters for which the native confor- 
mation has the lowest energy for a single protein. 

In this work we address the same problem as Hao 
and Scheraga, but use the contact representation. 
That is, we ask whether it is possible to find con- 
tact energy parameters, such that among all physical 
maps for a particular it single protein, the energy of 
the native map is the lowest? In more detail, one 
requires that 



£(a,S ,w) < £(a,S c ,w). 



(3) 



That is, the parameters w should be such that when 
the sequence a is threaded into any physical non- 
native contact map S c , the resulting energy should 
be higher than that of the native map So. 

Asking the question posed above in the contact 
energy representation has a distinct advantage over 
other potentials, since in our case the energy is a 
linear function of the parameters w. Therefore once 
a large library of candidate maps S c has been gen- 
erated, one can search, by the well known method 
|p4| p8| of perceptron learning, for a set of w for 
which Eq. (j^) holds for all maps from this library. 

Even though ideally Eq. (||) should be satisfied for 
any sequence a of amino acids, existing in nature or 
synthesized, it is not clear at all that it is possible 
to find a set w for which (|J) holds for even a sin- 
gle protein. The reason is that as we have recently 
shown |^9| , the number of physically realizable con- 
tact maps is exponential in the length N of the pro- 
Thus for a short protein (with, say, N = 40), 
implies that about 2 40 conditions should be 
satisfied by tuning 210 parameters! 

We believe that this is a highly relevant question; 
clearly the true potential (which, of course, is far 
more complex than our Eq (|J)) is able to fold all 
natural proteins: there should be a potential of in- 
termediate complexity between the true one and the 
simple contact energy we are testing here, which is 
able to fold, say, a family of proteins. This work is a 




1 The number of self avoiding walk configurations is also 
exponential, albeit with a larger coefficient of N in the 
exponent 
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first step towards developing a methodology to test 
any such potential. 

It is important to realize that the conditions we 
try to satisfy are much more stringent than the one 
usually required for successful threading p7| , pO| , pT| . 
We did succeed [B2| to find a set of contact ener- 
gies w that satisfies Eq. (j^) simultaneously for a large 
family of proteins, provided the decoy maps S c were 
obtained by (gapless) threading. The reason is that 
a contact map obtained by threading is. usually, a 
rather poor guess for the native fold [ |17|j3C|j3l|| (see 
Fig. [l] of energy histograms of threading and mini- 
mization). 
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FIG. 1. Histograms that demonstrate the difference 
in energy between ensembles of contact maps obtained 
by threading and by energy minimization. We used 4 
sets of contact energy parameters: HL, Hinds and Levitt 
@; MD, Mirny and Domany §]; MJ, Miyazawa and 
Jernigan Q; MS, Mirny and Shakhnovich Q. 

Since one cannot perform an exhaustive search of 
all physical maps, we must generate a large ensemble 
of alternative contact maps of low energy. We will 
assume that only such a subset of contact maps is 
effectively in competition with the native one to be 
the ground state and only these maps gives rise to 
relevant constraints in Eq.(|J). 

Our strategy and the outline of this paper are as 
follows: 

• Generation of alternative conformations: in 
Sec II we outline briefly the manner in which 
such a set of low-energy alternatives is gener- 
ated. Details of this method will be presented 
in a separate publication |uj |. 



• Learning of a set of pairwise contact energy 
parameters: in Sec. Ill, we present the way 
in which we use these contact maps to "learn" 
the energy parameters; the results obtained for 
a single protein, crambin, are in Sec IV. 

• Our results are summarized in Sec V, where we 
also discuss perspectives and future directions. 

We chose crambin as the particular protein to 
study since it has a long standing history in protein 
foldin g si mulation investigations. Wilson and Do- 
niach |33| used a simplified model in which the con- 
formation of the backbone and side chains is speci- 
fied by dihedral angles and contact energies are cal- 
culated from the distribution of pairwise distances 
observed in known experimental structures. Among 
other results, they were able to correctly reproduce 
the formation of secondary structures and many of 
the features of the hydrophobic core. Kolinski and 
Skolnick |m| performed accurate Monte Carlo simu- 
lations using a detailed lattice representation, opti- 
mized for the prediction of helical proteins. In their 
model, side chain rotamers were explicitely repre- 
sented by additional single monomers. The energy 
function, mostly of statistical origin, contained sev- 
eral terms to help the cooperative assembly of sec- 
ondary structures and the packing of the side chains. 
On average, their simulation runs ended up in con- 
formations with the correct topology of the native 
fold, and a RMSD distance of 3 A from the native 
C Q trace . Hao and Scheraga [ p0|pl| ] showed, by opti- 
mizing an extended set of energy parameters, that it 
is possible to fold crambin within 1-2 A RMSD from 
the native state. Their conclusion is, however, that 
it is always possible to find structures with lower 
energy than the native state. 

It should be borne in mind that within the contact 
map representation, conformational fluctuations, as 
measured by RMSD, amount to 1.1 A for crambin. 
This result is obtained by constructing 1000 struc- 
tures, following the method described in Ref. Q , all 
with contact maps identical to the native one, and 
averaging their RMSD values. 

Within the contact energy model, existing sets of 
contact potentials perform very poorly in a com- 
puter experiment of folding crambin. We demon- 
strated this by performing a Monte Carlo minimiza- 
tion, using these potentials, starting from the exper- 
imentally known native structure. For all the con- 
tact potentials tested this procedure identifies easily 
conformations that are very different from the na- 
tive one, and of significantly lower energy (see Fig. 
|). This clearly proves that for these potentials en- 
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ergy minimization will lead to non-native states and 
also demonstrates that our minimization procedure 
yields maps of much lower energy than those ob- 
tained by threading. 

II. GENERATION OF ALTERNATIVE 
CONFORMATIONS 

Crambin J3(| is a protein of length N — 46; its 
native map, constructed by taking the coordinates 
of the C a atoms from the PDB and using a threshold 
of 8.5 A to define contacts, is shown in Fig.pl. 
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FIG. 2. Contact map for the native state of crambin. 
Dark dots represent contacts (Sij = 1). There are 187 
non-nearest neighbors contacts. 

Our aim is to generate maps of energy low enough 
to "compete" with the native contact map. These 
alternative candidate maps should be markedly dif- 
ferent from one another, e.g. have a large relative 
Hamming distance ||] 

£> map = XJl^'-^l- (4) 

j>i 

The number of contact maps that can be actually 
sampled in any reasonable time is a negligible frac- 
tion of the 0(e 46 ) physical maps. Therefore, mov- 
ing in an efficient way in the space of physical con- 
tact maps is an essential component of our program. 
Clearly, by turning existing contacts off and non- 
existing ones on, we can generate large scale moves of 
the polypeptide chain; moves that would have taken 
very long time to accomplish in real space. This is 



one of the most attractive features of working with 
contact maps. There are two main problems with 
doing this. First, if one selects at random the con- 
tacts that are to be modified, chances are that the 
resulting maps will not be physical : that is, there 
exists no real polypeptide chain conformation whose 
contact map is the one we found. The second prob- 
lem is that we would like to work with moves that do 
not destroy secondary structure elements (a-helices 
and /3-sheets). 

As can be seen from Fig.||, these appear as clusters 
of non-zero entries in the map. Thick bands along 
the principal diagonal represent a helices; the small 
antiparallel (3 sheet which characterizes the native 
fold of crambin appears as a thick band, perpen- 
dicular to the principal diagonal, whose two strands 
extend from amino acids 1 to 4 and from 32 to 35, re- 
spectively. A contact map is roughly characterized 
by the number and the respective positions of its 
secondary structure elements. A typical native map 
has, in addition, isolated entries (single contacts or 
small clusters) that contain information about the 
global fold and relative positions of the secondary 
structure elements. 

We present here only a short description of our 
Monte Carlo method; for a more detailed exposi- 
tion we refer the reader to Ref. [|nj. Our algo- 
rithm consists of three steps. The first step consists 
of non-local moves. We start by identifying "clus- 
ters" of contacts in an existing map. These clus- 
ters represent cither a-hcliccs or /3-sheets (parallel 
or anti parallel) , or small groups of contacts between 
amino acids that are well-separated along the chain. 
The clusters are identified on a given map by lay- 
ing down bonds that connect neighboring contacts 
on the map and identifying clusters of contacts that 
are connected by such bonds Q . Some of the exist- 
ing clusters of contacts are removed and some other 
groups are restored elsewhere. This way secondary 
structure elements are destroyed and recreated at 
different locations and orientations. The "energy" 
of the resulting coarse map is evaluated and a low 
energy map is retained. This map serves as the start- 
ing point for the second step: local moves. This is 
a refinement procedure that consists of turning on 
or off (mostly one at a time) contacts that are in 
the vicinity of existing ones, following some of the 
rules introduced in Ref. 0. Again, only moves that 
lower (or do not significantly raise) the energy are 
accepted. These first two steps are fast operations, 
since they involve binary variables. Most of the com- 
puter time is taken by the third step, reconstruction, 
where we deal with the major problem of ensuring 



4 



that we stay in the subspace of physical maps. 

A generic contact map is not guaranteed to cor- 
respond to any real conformation of a polypeptide 
chain in space. To solve this problem, we developed 
an efficient Monte Carlo reconstruction method that 
checks whether any given target map is physical or 
not [|[. This is done by working with a string of 
beads that represents the backbone of the polypep- 
tide chain. The beads are moved around without 
tearing the chain and without allowing one bead to 
invade the space of another. The motion of this 
string is controlled by a "cost function" which van- 
ishes when the contact map of the string coincides 
with that of the target map. The cost increases when 
the difference between the two maps increases. This 
procedure ends up with a chain configuration whose 
contact map is physical by definition and close to the 
target map. Thus we are able to efficiently "project" 
any map that we have generated in the first two steps 
onto the subspace of physical maps. 

Having described the manner in which a single 
low energy chain and its corresponding map are ob- 
tained, we turn to describe the manner in which 
we generated a representative ensemble of contact 
maps, to be used in the derivation of contact energy 
parameters. In general, one expects to have two in- 
terplaying levels of optimization. On the one hand, 
one has to satisfy Eq.([|) for contact maps that are 
very different from the native one. On the other 
hand, with the same set of energy parameters, one 
should be able to discriminate between the native 
contact map and those maps that are close to it. 
We generated conformations close to the native one 
by running a series of Monte Carlo minimizations, 
starting from the native state. This procedure was 
not carried out in contact map space, but rather by 
using a local Monte Carlo procedure on the back- 
bone or chain of beads described above, with the 
position of each bead defined in real space; the ele- 
mentary move of this procedure is of the crankshaft 
type pq | . Each minimization consist of Nlmc such 
Monte Carlo steps and yields a chain and its low- 
energy candidate contact map. A move is accepted 
according to the Metropolis prescription at a given 
fictitious temperature Tlmc- The procedure is then 
repeated, starting again from the native state but us- 
ing different random numbers and generating a dif- 
ferent map. We call this procedure D\\ it generates 
an ensemble of low energy maps that are (relatively) 
close to the native fold. 

To generate conformations far from the native one, 
we use the three-step Monte Carlo method described 
above, supplemented by a fourth step; a further real 



space Monte Carlo minimization, as in procedure 
D\. That is, the three-step algorithm that performs 
global moves in contact map space ends with a chain 
conformation; this is used as the initial state of the 
Di procedure (again using Nlmc local steps) . Each 
search gives rise to a low energy contact map. Such 
a contact map is used as the staring point for the fol- 
lowing global search. This second procedure, called 
Z?2, generates a set of low energy maps that are very 
different from the native fold. In Fig. || we show 
energy versus Hamming distance of typical contact 
maps obtained by procedures D\ and D^. 
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FIG. 3. Energy E versus Hamming distance 75 ma P for 
1000 contact maps obtained by procedures Di and Di. 
The contact maps are generated using the energy param- 
eters of epoch t = 9 (see next section) and Nlmc = 10 6 . 
The line corresponds to the energy of the native contact 
map of crambin. 

The energies and maps obtained by the two 
minimization procedures depend strongly on the 
length of the run. As a part of our strategy, 
we repeated procedures D\ and D2 for Nlmc — 
10 3 , 10 4 , 10 5 , 10 6 , 10 7 local steps. By combining the 
two sets of contact maps derived by the two min- 
imization procedures, we obtained a representative 
set of P contact maps. 

III. LEARNING CONTACT ENERGIES 
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A. Derivation of contact energies as a learning 
problem 

In the crambin chain, 5 out of the 20 amino 
acids do not appear and 3 appear only once. Thus, 
among the corresponding 210 possible contact ener- 
gies, only 117 parameters can effectively enter the 
energy (||) for any set of candidate maps. These pa- 
rameters form a 117-component vector w. The na- 
tive map of Fig]| contains 187 non-nearest neighbor 
contacts, which involve only 72 of the 117 possible 
contact energy parameters. We now show that for 
any map S M the condition Eq.(^|) can be trivially 
expressed as 



> 



(5) 



To see this just note that for any map S M the energy 
(|J) is a linear function of the 117 contact energies 
that can appear and it can be written as 



£?(a,S„,w) 



117 

E 



N c (S,j,)w c 



(6) 



Here the index c = 1,2, ...117 labels the different 
contacts that can appear and jV c (S M ) is the total 
number of contacts of type c that actually appear in 
map S M . The difference between the energy of this 
map and the native one is therefore 



AE„ 



117 



(7) 



where we used the notation 



is satisfied for every example from a training set of 
P patterns, x^, fj, = 1, . . . , P. If such a w exists for 
the training set, the problem is learnable; if not, it is 
unlearnable. We assume that the vector of "weights" 
w, as well as all examples x„ are normalized, 



w • w = x M • x p = 1 



(10) 



The vector w is "learned" in the course of a train- 
ing session. The P patterns are presented cyclically; 
after presentation of pattern fj, the weights w are 
updated according to the following learning rule: 



|w+?jx M | 



if 

otherwise 



w ■ x p < 



(11) 



This procedure is called learning since when the 
present w misses the correct "answer" > for 
example /x, all weights are modified in a manner that 
reduces the error. No matter what initial guess for 
the w one takes, a convergence theorem guarantees 
that if a solution w exists, it will be found in a finite 
number of training steps. p^ , p5| . 

Different choices are possible for the parameter ri. 
Here we use the learning rule introduced in Ref. |g8) , 
since it allows, at least in principle, to assess whether 
the problem is learnable. The parameter r\ is given 
at each learning step by 



V = 



1/d 



i - Kid 



(12) 



where the parameter d (called despair) evolves dur- 
ing learning according to 



NJS, 



iV c (So) 



(8) 



and So is the native map. 

Each candidate map S M is represented by a vector 
x M and hence the question raised in the introduction 
becomes whether one can find a vector w such that 
condition (||) holds for all x M ? If such a w exists, it 
can be found by perceptron learning. 



B. Perceptron: Learning from examples 

A perceptron is the simplest neural network p4| . 
It is aimed to solve the following task. Given P 
patterns (also called input vectors, examples) x^, 
find a vector w of weights, such that the condition 



> 



(9) 



dt = 



7] 



y/i + 2rjh» + rf 



(13) 



Initially one sets d = 1. 

The training session can terminate with only two 
possible outcomes. Either a solution is found (that 
is, no pattern that violates condition (^) is found in 
a cycle), or unlearnability is detected. The problem 
is unlearnable if the despair parameter d exceeds a 
critical threshold [E8| 



d > d c = 



M M+1 

2 M - 1 



(14) 



where M is the number of components of w. 

Evidently, once the requirement posed in the in- 
troduction, Eq (|j|), has been expressed in the form 
(||) , the question whether it does or does not have a 
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solution reduces to deciding whether a set of exam- 
ples is learnable by a perceptron. Every candidate 
contact map (generated by the search procedure de- 
scribed above) provides a pattern for the training 
session. Note that the vector x defined in Eq. (|8|) 
must be normalized before ( |Tl| ) is used. Before turn- 
ing to present our results for the learnability of cram- 
bin using M = 117 contact parameters, we address 
the same question but use a much simpler potential, 
that of the HP model @ 

C. Can the HP model stabilize crambin? 

We tried to stabilize the native map of crambin 
using the parametrization of the HP model. This 
has an even simpler potential than the one we use; 
whereas we have 117 contact energies to tune, the 
HP model attempts to satisfy Eq. using only 
M = 3 parameters, w\,W2,Ws- The perceptron 
learning procedure detected clearly and unambigu- 
ously that this is an unlearnable problem. 

We relabeled the amino acids in the crambin se- 
quence following the usual partition into hydropho- 
bic (H) and polar (P) residues. Examples were gen- 
erated and then the perceptron learning procedure 
was applied to this problem. We measured the value 
of the despair d as learning progressed; the result is 
shown in Fig. [|. The critical value d c = 3 4 /2 2 was 
reached rather fast when we used only 306 examples; 
that is, we established that the problems is unlearn- 
able. 
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FIG. 4. Increase of the despair d with the number r 
of learning sweeps for the HP learning. 



This is a very important lesson from a simple 
model: with only 2 species of amino acids, the con- 
tact map representation is not suitable for folding 
crambin since it is impossible to find a set of contact 
energy parameters that satisfy Eq.Q. We regard 
the present work, which employs contact energy pa- 
rameters between the 20 existing species of amino 
acids (15 for crambin), as the first step in a progres- 
sive and controlled improvement of protein structure 
representation. 

The question we posed concerns learnability of an 
exponential number of examples, of which we can 
sample only a small subset. When 15 amino acids 
are used the problem is far from being as simple as 
the HP model; this is evident from the fact that a 
large number of examples, generated by threading, 
can be easily learned. Hence, in order to answer 
the question we posed it is important to choose a 
strategy that generates "hard" examples. Such a 
strategy is described next. 

D. Iterative Learning and Generation of 
Examples 

The contact energy parameters are learned in an 
iterative manner, i.e. examples are generated and 
then learned; the new w is then used to generate 
new examples and so on. We will refer to each such 
iteration as an epoch. At epoch t = we start from 
an arbitrary set Wo of parameters; we used those 
that were derived in Ref. || for the present C a repre- 
sentation. Using the procedure of generation of low 
energy conformations discussed above with these en- 
ergy parameters, we generated a set of Pq low energy 
contact maps. This completes epoch and we can 
start epoch t = 1. The Pq low energy contact maps 
obtained in epoch constitute the training set to 
learn new contact parameters, wi, according to the 
perceptron learning rule discussed in the previous 
section. Using the newly derived energy parame- 
ters, we generate a new set of Pi low energy contact 
maps. This set is added to the old training set so 
that now we have Pq + Pi examples, all of which are 
learned in epoch t = 2. In the present work this 
iterative procedure is repeated up to epoch 10. 

IV. ELUSIVE UNLEARNABILITY 
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A. Impracticality of despair 



We summarize here our main result about the 
question we have addressed in the present work. We 
will present below considerable evidence supporting 
our main conclusion: 

the problem of fine tuning the contact 
energy parameters to stabilize the native 
state of crambin is effectively unsolvable. 

By this we mean that the problem is either unlearn- 
able, or learnable with a learning time which ex- 
ceeds any realistic scale. We cannot give a clear- 
cut answer as we did for the HP model since the 
condition that should be met to establish un- 
learnability is numerically impractical; according to 
Eq.(||), for M = 117, the critical despair d c ~ 10 87 . 
Moreover, from Fig.|5| we see that in this particular 
problem, when we tried to learn P = 195124 ex- 
amples, the despair d increases logarithmically with 
the learning time r. This particular learning task 
took r = 606756 sweeps to be solved, and the final 
value of the despair was d = 4921.01. The size of P 
and tP and r is strikingly larger than those involved 
in the HP case, where t — 2 sweeps and P = 306 
examples were enough to obtain an unambiguously 
negative answer, (in that case after r = 7 the despair 
was d > 10 30 ). 

This is also in contrast to perceptron learning of 
P > 2M randomly generated examples (which is an 
unlearnable problem |^6|,^7j for large M); there d 
grows exponentially with learning time |28j| . Since 
the time that would be needed to exceed the criti- 
cal despair in our particular problem is beyond any 
reasonable estimate, we have to resort to alternative 
non-rigorous ways to test learnability of this task. 
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FIG. 5. Increase of the despair d with the number 
r of learning sweeps for a typical case of learning dealt 
with in this work. 



B. Generalization error 

As explained above, the parameters w that were 
obtained after a new epoch solve a larger training set 
of examples and, hence, they may well be a "better 
solution" of the problem. The quality of any solu- 
tion w is measured by the generalization error e g . 
To determine it, one generates a set of new exam- 
ples that were not used in the training procedure; 
e g is the fraction of examples for which w produces 
the wrong answer and it should decrease when the 
training set is increased. 

In the context of our problem we generate at epoch 
t, using the procedure described above with the cur- 
rent energy parameters wj, a set of Pt of low-energy 
contact maps. s g (t) is the fraction of those contact 
maps that violate Eq.(||) and hence have lower en- 
ergy than the native map. The dependence of e g (t) 
on the epoch index t is shown in Fig. ^. 

Initially e g (t) decreases drastically with t. We 
used several of the existing knowledge-based contact 
potentials |?],[l^;[l3|,^2| as our starting energy param- 
eters wo; the fact that e s (0) ~ 1 signals that these 
potentials fail completely the test of assigning the 
native map an energy that is lower than that of maps 
obtained by our minimization procedure. This is to 
be compared with the good performance of the same 
potentials on testing the native fold against maps ob- 
tained by threading [ |32[ , highlighting the point made 
in the Introduction, that stabilizing the native map 
against our low-energy decoys is a much more dif- 
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ficult challenge than stabilizing it against maps ob- 
tained by threading. 

With increasing epoch index, however, the gener- 
alization error does not level out at zero; rather, it 
fluctuates at the level of 0.2 - 1 percent. Complete 
learning is elusive; this behavior indicates that the 
problem is, probably, unlearnablc. 




FIG. 6. Generalization error e s (t) of conformations 
with energy lower that the native state as a function of 
the epoch t. 



C. Learning time 



we repeat the previous procedure, randomly select- 
ing Nl training sets of P = nAP patterns in each 
and compute the average learning time as before. In 
the data shown in Fig. ^| we used Nl = 6 at epoch 
t = 9. We followed this random selection procedure 
to eliminate all possible dependence of the learning 
time on the epoch index, isolating the variation of r 
with the size of the training set. 

The observed increase of the learning time with P 
is compatible with a divergence of (r) at some finite 
P c ~ 5 • 10 5 , again indicating unlearnability. 
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FIG. 7. Inverse average learning time l/(r) as a func- 
tion of the inverse number 1/P of examples. 



Further evidence supporting unlearnability comes 
from the way the learning time r increases with the 
size of the training set. r is defined as the number of 
sweeps through the entire training set that is needed 
to learn all the P examples. In an unlearnable case 
there exist sets of examples for which no solution 
can be found; for large enough P the training set 
will include, with non- vanishing probability, such an 
unlearnable subset. This means that the learning 
time r diverges for a finite P. We show in Fig. [t] the 
average inverse learning time l/(r) as a function of 
the inverse number 1/P of examples. The curve was 
obtained as follows. At the end of our last epoch 
we collect all P tot contact maps that have been ac- 
cumulated so far (during all epochs). Of these we 
randomly select a subset of AP maps and compute 
the learning time r. This process is repeated Nl 
times, each time selecting a different set of AP ex- 
amples, (t) is the average learning time of these Nl 
learning processes. To study how the learning time 
t increases with the number of training examples, 



V. ANALYSIS OF THE CONTACT 
ENERGIES 

Even though the problem is apparently unlearn- 
able, our procedure produces contact energies that 
have several appealing features. One of these has 
been mentioned above: whereas for the existing con- 
tact potentials it is very easy to find maps whose 
energy is below that of the native map, with the w 
obtained after several training epochs this becomes 
a difficult (albeit possible) task (see Fig. [|). That 
is, the generalization error becomes very small. We 
present now some other features of the contact pa- 
rameters obtained by our learning procedure. 

A. Energies of the false ground states 

Another measure of the quality of the energy pa- 
rameters is given by the gap AE between the ener- 
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gies of the false ground states and that of the native 
map. A negative value of AE means that (0) is vi- 
olated. 

We found that the average |A25| of the violating 
examples decreases with the epoch index, see Fig. 
||. Hence our learning procedure flattens the energy 
landscape, reducing both the number of violating 
examples and their gap. Another relevant question 
concerns the "location" of these false minima, i.e. 
how different are the corresponding structures from 
the native one? To study this, we reconstructed 
the three dimensional conformations corresponding 
to the violating examples and measured their aver- 
age RMSD distance from the native conformation. 
We found that the RMSD does not decrease with 
the epoch index; moreover, using procedure D2, false 
minima are found at an approximate average RMSD 
of 10 A at any epoch. Only their number decreased 
significantly. Hao and Scheraga pcfl , on the other 
hand, reported that the distance of their false ground 
states from the native conformation did decrease as 
their energy parameters became better optimized. 
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FIG. 8. Energy gap (AE) of the violating examples 
at successive epochs. 

B. Energy distribution at successive epochs. 

As already observed, with the initial energy pa- 
rameters the vast majority of the contact maps that 
are generated have an energy lower than the native 
contact map. As can be seen from Fig. ^, where 
the energy scale is shifted so that the native contact 
map has always zero energy, for increasing epoch in- 
dex, the energy distribution shifts to the right and 



becomes narrower. Learning is thus accompanied by 
an improvement of the Z-score, which is a commonly 
used estimator of the quality of a set of energy pa- 
rameters [ p2| . 
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FIG. 9. Normalized histogram H(E) of the energies 
of the contact maps at epochs t = 0, 1, 2 and 10. The 
energy scale is shifted so that the native contact map 
energy is 0. 

C. Allowed region in parameter space 

The vector w of energy parameters lies on the 
surface of a unit sphere in the N w — 117 dimen- 
sional parameter space. Each example introduces 
a hyperplane which restricts the allowed vectors to 
one of its sides. The region which satisfies all the 
P constraints is called version space. All vectors w 
that lie within version space are compatible with the 
constraints given by Eq. (||) and, therefore, are so- 
lutions of the learning problem. As more examples 
are added, version space may shrink - if the problem 
is unlearnable, the (relative) volume of version space 
shrinks to zero. 

To estimate the size of version space we gener- 
ated an ensemble of solutions by the following Monte 
Carlo sampling. At each epoch we arrive by percep- 
tron learning at a particular solution w that satis- 
fies the set of P examples which define the current 
version space. Starting from this solution, we per- 
formed a random walk on the surface of the unit 
sphere of w vectors. The elementary step is to 
choose at random a component k of w and to change 
it, 
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w k = Wk + £ 

where £ is a random displacement. The new array 
of weights is kept (and normalized) if it is still a so- 
lution, otherwise it is rejected. This updating rule 
is repeated many times, and eventually a sizeable 
number of different solutions is obtained. Next we 
perform a principal component analysis of the co- 
variance matrix of this ensemble of solutions. The 
covariancc matrix is defined as 



Cij = ((wi - (wi))(wj - (wj))) 



(15) 



where (•) denotes averages taken over the ensemble 
of solutions. Let A, > and Vj be the eigenvalues 
and the corresponding eigenvectors of the covariance 
matrix. Clearly, <jj = vAj is the standard deviation 
which measures the spread of our ensemble of solu- 
tions w along direction Vj. If we observe A, — * 0, 
this means that along the corresponding direction 
the width of version space has shrunk to zero. The 
projections of our vector of energy parameters along 
these directions are fixed and cannot be changed. 
As shown in Fig. 



10 



the number L of directions 
whose corresponding eigenvalues approach zero in- 
creases with the epoch index t (to about half the 
number of directions) . 

We also checked whether the directions that are 
constrained do or do not change with the epoch in- 
dex and found that these directions become con- 
served. The check was performed by measuring for 
t > 6 the variance of the parameters along the direc- 
tions that were locked at epoch t = 6. Hence further 
optimization of these parameters to fold other pro- 
teins is ruled out. 




Another aspect of the solutions derived by learn- 
ing is their convergence. We calculated, after every 
epoch t, the average solution (i.e. the center of mass 
of version space), w t . The overlap fl — w t • w t+ i of 
such average solutions, measured after two succes- 
sive epochs t and t + 1, increases with t to a value 



very close to 1 (see Fig. 11). This indicates that the 
vector wt converges for large ts to some particular 
direction. 




FIG. 10. Number L of locked directions in parameter 
space as a function of the epoch t. 



FIG. 11. Convergence of the 

scalar product fi = Wj • w f+ i in the learning process, 
as a function of the epoch t. 



VI. CONCLUSIONS 

One of the simplest and more widely used forms 
for the energy of a protein is the contact approxima- 
tion, (see Eq.(^)). Although such approximations 
have contributed a lot to our understanding of the 
general properties of the folding transition, it is far 
from clear which are its intrinsic limitations if the 
task is to actually predict native conformations from 
protein sequences. 

To give a clear-cut solution to this problem, in this 
work we have posed a remarkably simple question: 
Is it possible to optimize a set of contact energy pa- 
rameters such that the native contact map of a single 
protein has the lowest energy among all the possible 
alternative contact maps? 

We have reached the conclusion that the deriva- 
tion of an such a set is at the edge of learnability, 
even in the case of one protein only. For any prac- 
tical purpose, the basic requirement that the native 
state of a protein should be the minimum of some 
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effective free energy, cannot be fulfilled if a contact 
approximation is used. 

We stress the obvious fact that learning a set of 
contact energy parameters for more than one protein 
is necessarily more difficult, and would not change 
our conclusion. 

Our conclusion is supported by substantial evi- 
dence: 

1. If only two species of amino acids are used, 
the unlearnability of the task is unambiguously 
shown. 

2. When 15 species of amino acids are considered, 
the time r needed to learn the set of energy pa- 
rameters that stabilize crambin increases dra- 
matically with the number P of alternative 
contact maps that are considered. Such in- 
crease of r is compatible with a divergence at 
a finite P. 

3. The generalization error e g does not asymptote 
to zero, rather it fluctuates around a finite, 
although small, value. 

4. The distance from the native state of contact 
maps that are found with energy lower than 
the native one does not decrease as the opti- 
mization is carried on. 

5. The allowed region in energy parameter space 
shrinks to zero along roughly a half of the to- 
tal number of directions. Thus, a further opti- 
mization of parameters along these directions 
is ruled out. 

Even within a contact energy framework, more ac- 
curate and possibly more successful approximations 
are possible. For example, an all-atom based defini- 
tion of contact instead of one based on the C a only, 
could be expected to improve the quality of the pre- 
diction. We regard, however, the results presented 
here as a first step towards a systematic improve- 
ment of the approximation of the energy function 
to be used in folding predictions. In planned future 
work, the simple form of the energy used here will 
be supplemented with the inclusion of additional en- 
ergy terms, such as hydrophobic (solvation), hydro- 
gen bond or multi-body interactions. 

A different question, which is not in the scope of 
the present work, is how does a set of contact energy 
parameters derived by perceptron learning compare 
with other existing sets. We have addressed this 
problem by learning together 153 different proteins, 
and considering alternative conformations generated 
by gapless threading ]32] |. 



This last issue is connected with the possibility to 
perform predictions that do not rely completely on 
energy minimization alone. For example, the con- 
dition that the native state should be the absolute 
minimum of the energy function can be relaxed. In 
such "weak" prediction, a short list of candidates is 
identified and used as starting point for a successive 
selection. 
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